Create a Rule Set: Sintelix Extension
Background
Before creating a Rule Set, it is helpful to understand the key concepts behind Gold Standard testing (Concept: Harvester Gold Standard).
Effective Rule Sets
There are three stages to establishing an effective Rule Set. Creating the Rule Set is the first stage.
Video: Create a New Rule Set
Click the image below to view a video. The video uses the Sintelix Extension to create a new Rule Set, adds another document to the Gold Standard Collection, and finally shows you how to create and modify rules to get a perfect score.
Recommended Process
The Sintelix Extension is designed to make creating Rule Sets and harvesting a Gold Standard Collection easy. It allows you to quickly collect representative Gold Standard documents and create most of the rules you need for the Rule Set.
To create a new Rule Set using the Sintelix Extension:
-
browse to a web page representative of the pages you want the Rule Set to harvest
-
select the Sintelix Extension icon, and select New Rule Set
-
follow the procedure below to create a new Rule Set, ensuring you keep the Create gold standard from this page checkbox selected
-
when prompted, click to select and unselect elements on the web page to harvest and then select the Add to Gold Standard button, which will add your first document to the Gold Standard Collection.
-
go to Sintelix, select Configurations > Harvester Rule Sets, select the newly created Rule Set, and then select the newly added document to view the Full Page document. Click each coloured element to add its rule to the Rule Set. See Evaluate and Modify the Rule Set.
-
navigate to another representative web page and use the Sintelix Extension to harvest it manually. See Harvest via Sintelix Extension.
Make sure you select the advanced checkbox so that you can:
-
apply the newly created Rule Set
-
select any additional elements not found in the previously harvested document(s)
-
keep the Full Page checkbox checked
-
select Harvest to collect the Gold Standard document and the Full Page document
-
repeat this step until you find no new elements on the web pages (for example, you may want to collect 5-10 documents initially)
-
in Sintelix, select Configurations > Harvester Rule Sets so you can now evaluate and modify the Rule Set. See Evaluate and Modify the Rule Set.
-
once you have achieved the best evaluation score, test the Rule Set by running a harvest query (for example, Harvest via URLs) to add more documents to your Gold Standard Collection, ensuring the Add Full Document checkbox is selected.
-
in Configurations > Harvester Rule Sets, evaluate and modify the Rule Set using the larger collection sample. See Evaluate and Modify the Rule Set.
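The "repeat until you find no new elements" step in the process above is essentially a saturation check. The sketch below illustrates that idea only; it is not part of Sintelix and does not call any Sintelix API. The function name `pages_until_saturated`, the `patience` parameter, and the element labels are all hypothetical, and each page is assumed to be represented as a set of labels for the elements you selected on it.

```python
from typing import Iterable, Set


def pages_until_saturated(pages: Iterable[Set[str]], patience: int = 2) -> int:
    """Return how many pages were examined before `patience` consecutive
    pages contributed no previously unseen element labels.

    If the pages run out first, the total number of pages is returned.
    """
    seen: Set[str] = set()
    quiet_pages = 0  # consecutive pages that added nothing new
    examined = 0
    for elements in pages:
        examined += 1
        new_elements = elements - seen  # labels not seen on earlier pages
        seen |= elements
        quiet_pages = 0 if new_elements else quiet_pages + 1
        if quiet_pages >= patience:
            break
    return examined


# Hypothetical element labels selected on five harvested pages.
sample_pages = [
    {"title", "author", "body"},
    {"title", "body", "date"},
    {"title", "author", "body"},
    {"title", "body", "date"},
    {"title", "body", "comments"},
]
print(pages_until_saturated(sample_pages))
```

With this sample data the check stops after the fourth page, because the third and fourth pages add nothing new. A larger `patience` value is the more cautious choice when pages on a site vary a lot, which is in keeping with the suggestion to collect several documents before evaluating the Rule Set.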
Create a New Rule Set
To create a new Rule Set you can:
-
use the Sintelix Extension, which provides a wizard to guide you through the process (recommended).
Benefit: Creating a Rule Set using the Sintelix Extension allows you to harvest your first Gold Standard document ready to develop the first rules for the Rule Set.
-
select Configurations > Harvester Rule Sets to create, copy or import a pre-existing Rule Set (see Manage Rule Sets).
Creating a Rule Set from the Harvester Rule Sets configuration creates an empty Rule Set. The next step is to add Gold Standard documents to the linked Gold Standard Collection and then create the rules for harvesting.
Once a Rule Set has been created, it can only be modified through the Configurations > Harvester Rule Sets interface. See Evaluate and Modify the Rule Set.
Sintelix Extension: Create a New Rule Set
-
On the page you want to harvest, click the Sintelix Extension icon, and select New Rule Set.
Result: The Make a new rule set for this site dialog is displayed.
-
Change the project name, if required.
-
Change the Rule Set Name, as required.
-
Modify the settings as required. These can be modified later. See Rule Set Configuration Settings.
-
Leave the Create gold standard from this page checkbox selected (recommended).
-
If you left the Create gold standard from this page checkbox selected:
-
Result: The following dialog is displayed.
-
Select the elements on the page in the same way you do when performing a manual harvest using the Sintelix Extension (Harvest via Sintelix Extension).
-
Select the button when you have finished selecting the required elements.
Result: The following message is displayed.
-
Select the button to open the Rule Set in Sintelix, or select the x to close the dialog.
-
If you cleared the Create gold standard from this page checkbox:
-
Result: The following dialog is displayed.
-
Select Edit your new Rule Set to open the Rule Set in Sintelix, or select Close to close the dialog.